Hedge cold container-metadata reads across regions (#4253)#4608
Hedge cold container-metadata reads across regions (#4253)#4608NaluTripician wants to merge 1 commit into
Conversation
A cold container/collection metadata read (DocumentCollection point read) that warms the container cache could be routed to a slow or unhealthy preferred region and stall the read -- and any operation blocked on it -- past the caller's timeout. If the caller's future was dropped (Rust cancellation) during the control-plane timeout escalation, the inline cross-region failover was preempted and no attempt was made against the next region (issue Azure#4253; same class as dotnet #5805). Rather than keeping the read alive past caller-drop with a detached task (which conflicts with the crate's structural, no-detached-tasks model), this widens the cross-region hedging eligibility to cover DocumentCollection point reads. Once the hedge threshold elapses, the read is speculatively dispatched to a second preferred region in parallel, so a healthy region returns within the caller's deadline -- structurally removing the single-region-stall / cancellation- preemption window. This is the metadata-read phase the eligibility comment already anticipated; no other change to should_hedge is required. Scope matches the origin issue: only the container/collection cache read is covered. Partition-key-range reads are intentionally excluded (the .NET issue notes the pk-range cache is unaffected), and container writes / feed reads stay ineligible (idempotent point reads only). Adds tests: container read is hedged; container write is not; container ReadFeed is not. Fixes Azure#4253 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR extends the Cosmos driver’s existing structural cross-region hedging to cover cold container/collection metadata point reads so a slow/unhealthy preferred region can’t stall cache warm-up past the caller’s timeout (addressing #4253) without introducing detached background tasks.
Changes:
- Expanded hedging eligibility to include
ResourceType::DocumentCollection(in addition toDocument). - Added hedging-eligibility regression tests ensuring container point reads hedge, while container writes and feed reads do not.
- Added a driver changelog entry describing the behavioral fix.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| sdk/cosmos/azure_data_cosmos_driver/src/driver/pipeline/hedging_eligibility.rs | Widens hedge-eligible resource types to include container metadata reads and adds targeted tests for container read/write/feed behavior. |
| sdk/cosmos/azure_data_cosmos_driver/CHANGELOG.md | Documents the bug fix in the unreleased changelog section. |
|
|
||
| ### Bugs Fixed | ||
|
|
||
| - Extended cross-region hedging to cover the container/collection metadata read (`DocumentCollection` point reads). A cold container-cache warm-up routed to a slow or unhealthy preferred region could stall the read — and any operation blocked on it — past the caller's timeout, with the cross-region failover never reached if the caller gave up first. Hedging now speculatively dispatches the read to a second preferred region once the hedge threshold elapses, so a healthy region returns within the caller's deadline. This uses the existing structural hedging mechanism (no detached background tasks) and applies only to idempotent reads; container writes and feed reads are unaffected. ([#4253](https://github.com/Azure/azure-sdk-for-rust/issues/4253)) |
There was a problem hiding this comment.
Agreed with copilot - also the issue is being referenced, not the PR link
NaluTripician
left a comment
There was a problem hiding this comment.
Review — looks good, two non-blocking follow-ups
Correct and tightly scoped. Verified against the PR-head source:
- Gating is sound.
should_hedgeadmits an op only whenresource_type ∈ {Document, DocumentCollection}andis_read_only()andoperation_type ∈ {Read}. So a container point read qualifies, whilecreate_container(Create) andread_all_containers(ReadFeed) are correctly excluded. - Constructor→type mappings match the tests (
cosmos_operation.rs):read_container→Read+DocumentCollection,create_container→Create,read_all_containers→ReadFeed. Theassert_eq!(resource_type(), DocumentCollection)lines guard against constructor drift silently invalidating a test — nice. - Not inert.
should_hedge/evaluate_hedge_eligibilityare consulted on the metadata pipeline path, and the data-plane hub-region latch already stays off for metadata ops (region_latch_none_on_multi_master_or_metadata), so the widening genuinely activates hedging for the #4253 scenario.
Follow-ups (non-blocking)
1. Spec drift. HEDGING_SPEC.md still tracks metadata-read hedging as deferred Phase 2 ("design pass required before scheduling"), and the capability matrix shows it as not-yet-shipped. This PR ships the Container subset, so the design doc now contradicts the code. Please update the spec to move container/collection metadata reads from deferred → shipped.
2. Resolve the spec's own open question. The Phase-2 section explicitly asks: "Hedged metadata reads must not produce stale-cache races when one region returns an older view than another; decide whether to prefer the latest _etag/resource id or the fastest response." This PR effectively answers fastest-wins (default hedge behavior). I think that's safe for the container cache — the partition-key definition we warm on is immutable without a recreate, and staleness is bounded by replication lag + cache TTL (no worse than a single-region read) — but since the spec flagged it, let's record that reasoning rather than resolve it silently.
simorenoh
left a comment
There was a problem hiding this comment.
LGTM, just two nits on comments/ changelog
| /// past the caller's timeout, which is the cross-region-failover-preemption | ||
| /// scenario in issue #4253. Both are idempotent reads. |
There was a problem hiding this comment.
We probably don't need to mention the relevant issue here
|
|
||
| ### Bugs Fixed | ||
|
|
||
| - Extended cross-region hedging to cover the container/collection metadata read (`DocumentCollection` point reads). A cold container-cache warm-up routed to a slow or unhealthy preferred region could stall the read — and any operation blocked on it — past the caller's timeout, with the cross-region failover never reached if the caller gave up first. Hedging now speculatively dispatches the read to a second preferred region once the hedge threshold elapses, so a healthy region returns within the caller's deadline. This uses the existing structural hedging mechanism (no detached background tasks) and applies only to idempotent reads; container writes and feed reads are unaffected. ([#4253](https://github.com/Azure/azure-sdk-for-rust/issues/4253)) |
There was a problem hiding this comment.
Agreed with copilot - also the issue is being referenced, not the PR link
Summary
Reworks the fix for #4253 to use the driver's structural cross-region hedging instead of a detached background task (the prior approach, in #4607, conflicted with the crate's stated "no detached tasks" model). Closing #4607 in favor of this.
A cold container/collection metadata read (
DocumentCollectionpoint read) that warms the container cache could be routed to a slow or unhealthy preferred region and stall the read — and any operation blocked on warming the cache — past the caller's timeout. If the caller's future is dropped (Rust cancellation) during the control-plane timeout escalation, the inline cross-region failover is preempted and the next region is never tried.Fix
Widen
HEDGEABLE_RESOURCE_TYPESfrom[Document]to[Document, DocumentCollection]inhedging_eligibility.rs. Container point reads (OperationType::Read+ResourceType::DocumentCollection) become hedge-eligible, so once the hedge threshold elapses the read is speculatively dispatched to a second preferred region in parallel and a healthy region returns within the caller's deadline — structurally removing the single-region-stall / cancellation-preemption window, with no detached tasks. This is exactly the metadata-read phase the eligibility comment already anticipated; the rest of the hedging machinery (evaluate_hedge_eligibility->should_hedge->execute_hedged) already handlesPipelineType::Metadata, so no other change is required.Scope (matches the origin issue)
ReadFeedis also not hedge-eligible.)Safety
Hedging a container read issues at most one extra idempotent GET to another region, and only after the threshold (default
min(1000ms, request_timeout/2)) — so there's no duplicate-side-effect risk and no cost on the fast path. All existing hedge invariants (result classification, session tokens, PPCB/PPAF feedback, diagnostics) apply unchanged; the data-plane-only hub-region latch correctly stays off for metadata ops.Tests
Adds: container read is hedged; container write is not; container
ReadFeedis not.cargo clippy --all-targetsclean,cargo fmt --checkclean, full driver lib suite 1811 passed / 0 failed (37 hedging-eligibility tests including the 3 new).Fixes #4253